Tree Annotation Tool using Two-phase Parsing to Reduce Manual Effort for Building a Treebank

Authors

  • So-Young Park
  • Yongjoo Cho
  • Sunghoon Son
  • Ui-Sung Song
  • Hae-Chang Rim
Abstract

In this paper, we propose a tree annotation tool that uses a parser to build a treebank. To minimize manual effort without any modification of the parser, the tool first segments a sentence and then performs two-phase parsing: one phase for the intra-structure of each segment and one for the inter-structure across segments. Experimental results show that it reduces manual effort by about 24.5% compared with a tree annotation tool without segmentation, because the annotator's interventions related to cancellation and reconstruction decrease remarkably, although the tool requires the annotator to segment some long sentences.
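
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_parse`, `two_phase_parse`, the `SEG` and `S` labels, and the tuple-based tree encoding are all hypothetical names chosen for illustration, and the toy parser merely groups its input under one label where a real parser would build actual structure. The point shown is only that the same unmodified parser function is called once per segment and then once more over the resulting subtrees.

```python
from typing import Callable, List, Sequence, Tuple, Union

# A tree is either a leaf token (str) or a tuple: (label, child, child, ...).
Tree = Union[str, Tuple]


def toy_parse(units: Sequence[Tree], label: str) -> Tree:
    """Hypothetical stand-in for the unmodified parser.

    It simply groups the given units under one labeled node.
    """
    return (label, *units)


def two_phase_parse(
    tokens: List[str],
    boundaries: Sequence[int],
    parse: Callable[[Sequence[Tree], str], Tree] = toy_parse,
) -> Tree:
    """Parse each segment first, then parse over the segment subtrees.

    `boundaries` are token indices chosen by the annotator at which the
    sentence is split (an assumption about how segmentation is given).
    """
    # Split the sentence at the annotator-chosen boundaries.
    segments: List[List[str]] = []
    start = 0
    for end in list(boundaries) + [len(tokens)]:
        if end > start:
            segments.append(tokens[start:end])
        start = end

    # Phase 1: intra-structure of each segment.
    subtrees = [parse(segment, "SEG") for segment in segments]

    # Phase 2: inter-structure over the segment subtrees.
    return parse(subtrees, "S")


if __name__ == "__main__":
    print(two_phase_parse(["a", "b", "c", "d"], [2]))
```

Because the parser is passed in as a plain function, the sketch leaves it untouched in both phases, which mirrors the abstract's constraint of not modifying the parser.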


Similar resources

Interactive Predictive Parsing Framework for the Spanish Language

The Interactive Predictive Parsing (IPP) framework allows the construction of interactive tree annotation systems. These can help human annotators create error-free parse trees with little effort (compared to manually post-editing the trees obtained from a completely automatic parser). In this paper we adapt the IPP framework and the IPP-Ann annotation tool for parse of the Spanish lang...


Transformed Subcategorization Frames in Chunk Parsing

This paper describes an approach to treebank development which relies on the manual development of annotation tools. The overall process of tree annotation is described, and a special emphasis is put on the description of the last tool which has been built, i.e. a dependency-based robust chunk parser. The modularization of the parser and the central role of verbal subcategorization is presented...


TamilTB: An Effort Towards Building a Dependency Treebank for Tamil

Annotated corpora such as treebanks are important for the development of parsers and language applications, as well as for understanding the language itself. Only very few languages possess these scarce resources. In this paper, we describe our effort in syntactically annotating a small corpus (600 sentences) of the Tamil language. Our annotation is similar to Prague Dependency Treebank (PDT 2.0) and c...


A Machine Learning Approach to Automatic Functor Assignment in the Prague Dependency Treebank

The aim of this paper is to describe and evaluate a system that automates a part of the transition from analytical to tectogrammatical tree structures within the Prague Dependency Treebank. In particular, it assigns functors to autosemantic words. The system is based on the machine learning approach of decision tree induction. The resulting software tool is incorporated into the annotation proc...


Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language

The recent success of statistical parsing methods has made treebanks important resources for building good parsers. However, constructing high-quality annotated treebanks is a challenging task. We utilized two publicly available parsers, Berkeley and MST parsers, for feedback on improving the quality of part-of-speech tagging for the Vietnamese Treebank. Analysis of the treebank and parsi...



Publication date: 2005